Search for: All records
Total Resources: 4
Author / Contributor
- Mudigere, Dheevatsa (4)
- Ghobadi, Manya (2)
- Jia, Zhihao (2)
- Kewitsch, Anthony (2)
- Khazraee, Moein (2)
- Wang, Weiyang (2)
- Zhang, Ying (2)
- Zhong, Zhizhen (2)
- Ding, Yufei (1)
- Feng, Boyuan (1)
- He, Xi (1)
- Huang, Guyue (1)
- Jahani, Majid (1)
- Li, Ang (1)
- Ma, Chenxin (1)
- Mokhtari, Aryan (1)
- Muthiah, Bharath (1)
- Ribeiro, Alejandro (1)
- Takac, Martin (1)
- Wang, Yuke (1)
The deployment of Deep Learning Recommendation Models (DLRMs) involves parallelizing extra-large embedding tables (EMTs) across multiple GPUs. Existing works overlook the input-dependent behavior of EMTs and parallelize them in a coarse-grained manner, resulting in unbalanced workload distribution and excessive inter-GPU communication. To address this, we propose OPER, an algorithm-system co-design with OPtimality-guided Embedding table parallelization for large-scale Recommendation model training and inference. The core idea of OPER is to exploit the connection between DLRM inputs and the efficiency of distributed EMTs in order to derive a near-optimal parallelization strategy. Specifically, we conduct an in-depth analysis of different types of EMT parallelism and propose a heuristic search algorithm that efficiently approximates an empirically near-optimal EMT parallelization. Furthermore, we implement a distributed shared-memory-based system that supports the lightweight but complex computation and communication patterns of fine-grained EMT parallelization, effectively converting the theoretical improvements into real speedups. Extensive evaluation shows that OPER achieves 2.3× and 4.0× average speedups in training and inference, respectively, over state-of-the-art DLRM frameworks.
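The abstract's workload-balancing idea can be made concrete with a small sketch. Below is a minimal, hypothetical illustration of input-aware embedding-table placement, assuming each table's lookup cost has been profiled from sample inputs: a greedy longest-processing-time heuristic places each table on the currently least-loaded GPU. This is not OPER's actual search algorithm; `balance_tables`, `table_costs`, and the example numbers are invented for illustration.

```python
# Minimal sketch of input-aware embedding-table (EMT) placement.
# Assumption: each table's cost is a scalar proxy for its lookup workload
# (e.g., rows touched per batch x embedding dim), profiled from inputs.
# NOT OPER's algorithm -- just a greedy longest-processing-time heuristic.
import heapq


def balance_tables(table_costs: dict[str, float], num_gpus: int) -> dict[int, list[str]]:
    """Assign each table (heaviest first) to the least-loaded GPU."""
    loads = [(0.0, gpu) for gpu in range(num_gpus)]  # min-heap of (load, gpu)
    heapq.heapify(loads)
    placement: dict[int, list[str]] = {gpu: [] for gpu in range(num_gpus)}
    for table, cost in sorted(table_costs.items(), key=lambda kv: -kv[1]):
        load, gpu = heapq.heappop(loads)
        placement[gpu].append(table)
        heapq.heappush(loads, (load + cost, gpu))
    return placement


if __name__ == "__main__":
    # Hypothetical per-table cost estimates from profiled input traces.
    costs = {"emt_user": 9.0, "emt_item": 7.5, "emt_ad": 3.0, "emt_geo": 1.5}
    print(balance_tables(costs, num_gpus=2))
    # -> {0: ['emt_user', 'emt_geo'], 1: ['emt_item', 'emt_ad']} (10.5 each)
```

Under this toy cost model the two GPUs end up with equal load (10.5 each), whereas a naive contiguous split of the same tables would yield 16.5 versus 4.5; input-dependent cost estimates are what make the balanced placement possible.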
Wang, Weiyang; Khazraee, Moein; Zhong, Zhizhen; Ghobadi, Manya; Jia, Zhihao; Mudigere, Dheevatsa; Zhang, Ying; Kewitsch, Anthony (2023). 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23).
Wang, Weiyang; Khazraee, Moein; Zhong, Zhizhen; Ghobadi, Manya; Jia, Zhihao; Mudigere, Dheevatsa; Zhang, Ying; Kewitsch, Anthony (2023). 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23).
Jahani, Majid; He, Xi; Ma, Chenxin; Mokhtari, Aryan; Mudigere, Dheevatsa; Ribeiro, Alejandro; Takac, Martin (2020). Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics.